Word Spaces as Input to Categorisation of Attitude

Author

Jussi Karlgren

Abstract

The SICS starting points are threefold. First, given a semantic word space trained on general purpose text, where distance and nearness are measures of semantic similarity, we can represent a sentence by the centroid of the words that occur in it. Second, constructional features contribute to the organisation of this semantic space. Third, attitude is a semantic dimension of variation, in that sentences with similar attitudinal qualities can be expected to occupy space in the vicinity of each other. This year's simplistic experiment did not yield useful results. Parameter tuning is a necessary step in any categorisation exercise; this year we failed to devote the effort necessary to achieve results worth noting.

Word spaces for opinion analysis

This paper briefly describes the SICS attempt to participate in NTCIR-8 [6]. In previous experiments, among them ones performed in NTCIR-7, we successfully used constructional features in conjunction with lexical features in a word space, achieving high recall for attitudinal utterances even in cases where the lexical features alone would have yielded equivocal evidence on the character of the utterance [3, 2]. Our approach takes as its starting point the observation that lexical resources are always noisy and out of date, and most often suffer from being too specific and too general at the same time. Not only are lexical resources inherently somewhat unreliable or costly to maintain, they also do not cover all the possibilities of expression afforded by human linguistic behaviour: we believe that attitudinal expression in text is not solely a lexical issue. For the experiments reported here no attitudinal lexical resources were used; only general purpose linguistic analysis was employed to establish the constructions used in further processing.
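The centroid representation named in the starting points can be illustrated with a minimal sketch. The three-dimensional vectors in `WORD_SPACE` are invented for illustration only; they are not taken from the SICS word space, which is built from newsprint and has far more dimensions. Cosine similarity is used here as one common choice of geometric nearness measure, which is an assumption rather than something the paper specifies.

```python
import math

# Hypothetical toy word space; real word spaces have far more dimensions.
WORD_SPACE = {
    "excellent": [0.9, 0.1, 0.0],
    "good": [0.8, 0.2, 0.1],
    "terrible": [-0.9, 0.1, 0.0],
    "film": [0.0, 0.9, 0.3],
}


def centroid(tokens, space):
    """Mean of the vectors of the tokens found in the space; None if none are found."""
    vectors = [space[t] for t in tokens if t in space]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


positive = centroid("an excellent film".split(), WORD_SPACE)
similar = centroid("a good film".split(), WORD_SPACE)
negative = centroid("a terrible film".split(), WORD_SPACE)

# Sentences with similar attitude should lie closer together than
# sentences with opposite attitude.
print(cosine(positive, similar) > cosine(positive, negative))
```

With these invented vectors the two favourable sentences end up nearer to each other than to the unfavourable one, which is the geometric intuition behind treating attitude as a dimension of variation in the space.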
A basis for our approach is the Word Space Model [5, 4], a data structure based on a general multi-dimensional vector space model, where distance and nearness are used as estimates of semantic similarity, where those distances are computed from distributional data collected from sizeable amounts of general purpose text, and where similarity is computed geometrically in a multidimensional space. Our starting points are that, given a word space that represents semantic relations between terms, we represent a sentence by the centroid of the words that occur in it, but we add constructional features to contribute to the organisation of this semantic space, and we posit that attitude is a semantic dimension of variation, in that sentences with similar attitudinal qualities can be expected to occupy space in the vicinity of each other. This worked well for NTCIR-7, where we put a fair amount of effort into parameter tuning and into selecting the most appropriate background text collection.

NTCIR 8 MOAT experiment

This year's experiment was performed as simply as possible, without new parameter tuning, as a simplified version of the more successful experiment performed the year before. This proved insufficient: we were not able to regain the level of accuracy we reached in previous and other similar experiments [2], where we put more time into tuning the mechanisms for the corpus at hand.

1. We built a background semantic word space using random indexing from several years of newsprint material.
2. We transformed both the training set and the test set by surface syntactic analysis as described in our previous reports, including the attitude tag for the training set.
3. We projected the training and the test set, sentence by sentence, into the background space.
4. We exported the context vectors of the centroids of the training and test sets.
5. We used liblinear [1] to categorise the test set based on the training set.
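Steps 1, 3 and 4 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SICS implementation: the dimensionality `DIM`, the number of non-zero entries `NONZERO`, the window size `WINDOW`, and the toy sentences are all hypothetical, and the surface syntactic analysis of step 2 is omitted. The output uses the sparse text format that liblinear reads, one line per sentence with a label followed by 1-based `index:value` pairs.

```python
import random

DIM = 16      # hypothetical; the paper does not state its dimensionality
NONZERO = 4   # hypothetical number of +1/-1 entries per index vector
WINDOW = 2    # hypothetical context window, in tokens on each side


def index_vector(word):
    """Sparse ternary index vector, reproducible because the RNG is seeded by the word."""
    rng = random.Random(word)
    vec = [0] * DIM
    for pos in rng.sample(range(DIM), NONZERO):
        vec[pos] = rng.choice((1, -1))
    return vec


def build_word_space(sentences):
    """Random indexing: a word's context vector sums its neighbours' index vectors."""
    space = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            context = space.setdefault(word, [0] * DIM)
            for j in range(max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)):
                if j != i:
                    for k, value in enumerate(index_vector(tokens[j])):
                        context[k] += value
    return space


def sentence_centroid(tokens, space):
    """Mean context vector of the tokens found in the space (zero vector if none)."""
    known = [space[t] for t in tokens if t in space]
    if not known:
        return [0.0] * DIM
    return [sum(v[k] for v in known) / len(known) for k in range(DIM)]


def to_liblinear_line(label, vector):
    """Format one example as: label, then 1-based index:value pairs for non-zero features."""
    features = " ".join(f"{k}:{v:g}" for k, v in enumerate(vector, start=1) if v != 0)
    return f"{label} {features}".rstrip()


# Toy stand-in for the newsprint background collection.
background = [
    "the market rose sharply today".split(),
    "critics praised the new film".split(),
    "the film was a terrible failure".split(),
]
space = build_word_space(background)

# Toy stand-in for the tagged training set (1 = attitudinal, -1 = not; labels are illustrative).
training = [(1, "critics praised the film".split()),
            (-1, "a terrible film".split())]
lines = [to_liblinear_line(label, sentence_centroid(tokens, space))
         for label, tokens in training]
print("\n".join(lines))
```

Writing `lines` to a file would give input suitable for liblinear's training tool, which covers step 5; the test set would be exported the same way.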

Similar resources

The Impact of Teachers' Training on the Reliability of Tests and Assessments in Governmental and Non-governmental Sections

Assessment is considered one of the fundamental elements in the field of foreign language acquisition. In order for communication to take place, learners need to know an adequate amount of vocabulary. The salient role of vocabulary in the field of foreign language acquisition has resulted in the publication of several hundred papers and dozens of books. Due to the dominant role of ...

Level of categorisation effect: A novel effect in the picture-word interference paradigm

In four experiments we explored the effects of two variables in the picture-word interference paradigm: semantic relatedness and the level of categorisation of distractors relative to pictures' names. Experiment 1 addressed whether the contrasting effects of semantically related distractors in category- and basic-level naming have a methodological origin (i.e., differences in the number of respons...

Schemata-Building Role of Teaching Word History in Developing Reading Comprehension Ability

Methodologically, vocabulary instruction has faced significant ups and downs during the history of language education; sometimes integrated with the other elements of language network, other times tackled as a separate component. Among many variables supposedly affecting vocabulary achievement, the role of teaching word history, as a schemata-building strategy, in developing reading comprehensi...

Analytic Attitude toward the Contemporary Urban Public Spaces Related to the Communal Rituals

The use of lexical knowledge in phonetic categorisation

Lexical effects on phonetic categorisation have been taken as evidence that the listener's word knowledge influences phonetic processing during normal speech perception. The present study examined word-nonword effects in the categorisation of word-initial and word-final stop consonants. Natural speech was edited to produce bilabial, alveolar and velar voicing continua. The data revealed a signifi...

Publication date: 2010
